llama-stack-0.2.23 and adjust to new APIs #148
mkristian wants to merge 2 commits into containers:main
Conversation
Signed-off-by: Christian Meier <m.kristian@web.de>
Reviewer's Guide
Updates this distribution to llama-stack 0.2.23 / ramalama 0.17.1 and aligns configs and provider wiring with the new llama-stack APIs, including refreshed dependencies, updated provider spec types, a reworked ramalama-run.yaml, and removal of legacy provider discovery and model aliases.
Summary of Changes
Hello @mkristian, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request focuses on upgrading core dependencies and adapting the code to the new llama-stack APIs.
Activity
Hey - I've found 1 issue, and left some high level feedback:
- The dependency versions between requirements.txt and pyproject.toml are inconsistent for key packages (e.g., llama-stack is pinned to 0.5.0 in requirements.txt but 0.2.23 in pyproject.toml, setuptools is <70 vs 80.9.0), which can lead to hard-to-debug environment issues; consider aligning these pins or clearly separating dev/runtime constraints.
- Since the inline remote provider definition in providers.d was removed in favor of the new RemoteProviderSpec and ramalama-run.yaml config, it would be good to double-check that no tooling still relies on ~/.llama/providers.d, or add a migration note/compat shim if needed.
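The pin mismatch flagged in the first point can be caught mechanically. A minimal sketch of such a check, comparing exact `name==version` pins between the two dependency lists (the file names and sample contents here are assumptions for illustration, not the project's actual files):

```python
import re

# Matches exact pins of the form "name==version".
PIN = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*==\s*([A-Za-z0-9_.-]+)")

def parse_pins(lines):
    """Map package name -> pinned version for every 'name==version' line."""
    pins = {}
    for line in lines:
        m = PIN.match(line)
        if m:
            pins[m.group(1).lower()] = m.group(2)
    return pins

def pin_conflicts(requirements_lines, pyproject_dep_lines):
    """Return {name: (req_version, pyproject_version)} where pins disagree."""
    req = parse_pins(requirements_lines)
    pyp = parse_pins(pyproject_dep_lines)
    return {
        name: (req[name], pyp[name])
        for name in req.keys() & pyp.keys()
        if req[name] != pyp[name]
    }

# Example with the versions quoted in the review:
conflicts = pin_conflicts(
    ["llama-stack==0.5.0", "ramalama==0.17.1"],
    ["llama-stack==0.2.23", "ramalama==0.17.1"],
)
```

Run against the real requirements.txt and the pyproject.toml dependency list, a non-empty result would signal exactly the kind of drift described above.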
## Individual Comments
### Comment 1
<location> `src/ramalama_stack/ramalama-run.yaml:68-74` </location>
<code_context>
```diff
     config: {}
   post_training:
   - provider_id: huggingface
-    provider_type: inline::huggingface
+    provider_type: inline::huggingface-gpu
     config:
       checkpoint_format: huggingface
       distributed_backend: null
       device: cpu
+      dpo_output_dir: ${env.DPO_OUTPUT_DIR:=~/.llama/distributions/ramalama/dpo_output}
   safety:
```
</code_context>
<issue_to_address>
**issue (bug_risk):** Using `inline::huggingface-gpu` while explicitly setting `device: cpu` is confusing and may be misconfigured.
Here we’re mixing a `*-gpu` provider with `device: cpu`. Please either set `device` to a GPU value (or rely on the provider default) if you want GPU execution, or revert to a non-GPU provider type if you intend to stay on CPU, to avoid configuration drift and debugging confusion.
</issue_to_address>
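One CPU-consistent resolution of this comment, sketched below under the assumption that the non-GPU `inline::huggingface` provider type is still available in llama-stack 0.2.23 (worth verifying against the release):

```yaml
post_training:
- provider_id: huggingface
  provider_type: inline::huggingface   # non-GPU provider, consistent with device: cpu
  config:
    checkpoint_format: huggingface
    distributed_backend: null
    device: cpu
    dpo_output_dir: ${env.DPO_OUTPUT_DIR:=~/.llama/distributions/ramalama/dpo_output}
```

The alternative, keeping `inline::huggingface-gpu`, would mean setting `device` to a GPU value (e.g., `cuda`) instead.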
Code Review
This pull request updates llama-stack and ramalama dependencies and adapts the codebase to their new APIs. The changes are generally good, including dependency updates, API adoption, and removal of legacy code. However, I've identified a few issues that need attention. There's a critical version mismatch for the llama-stack dependency between pyproject.toml and requirements.txt. Also, setup.py contains a bug in an exception handler that will cause a NameError. Finally, the ramalama-run.yaml configuration for the post_training provider seems contradictory. I've added detailed comments on these points.
I am having trouble creating individual review comments, so my feedback follows inline below.
pyproject.toml (24)
There's a version mismatch for llama-stack. This file specifies version 0.2.23, which matches the PR title. However, the requirements.txt file lists llama-stack==0.5.0. This inconsistency can lead to unexpected dependency resolutions and should be corrected to ensure the correct version is used.
setup.py (20-22)
The logic from this except block seems to have been reused for the ramalama-run.yaml copy operation, but the variables were not updated. The new except block (on line 21 of the final file) still references providers_dir and target_dir_1, which are no longer defined, causing a NameError. The error message should reference run_yaml and target_dir_2.
For example:

```python
except Exception as error:
    print(f"Failed to copy {run_yaml} to {target_dir_2}. Error: {error}")
    raise
```

src/ramalama_stack/ramalama-run.yaml (69-75)
There's a configuration mismatch here. The provider_type is set to inline::huggingface-gpu, but the device is configured as cpu. This is contradictory. If a GPU is intended, the device should be configured accordingly (e.g., cuda). If only CPU is to be used, a non-GPU provider_type should be used.
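The setup.py correction above can be sketched as a self-contained helper. The names `run_yaml` and `target_dir_2` come from the review; the paths and function name here are illustrative, not the project's actual layout:

```python
import shutil
from pathlib import Path

def copy_run_yaml(run_yaml: Path, target_dir_2: Path) -> Path:
    """Copy ramalama-run.yaml into its target directory, reporting the
    right variables on failure (the fix the review asks for)."""
    target_dir_2.mkdir(parents=True, exist_ok=True)
    try:
        return Path(shutil.copy(run_yaml, target_dir_2))
    except Exception as error:
        # The corrected message references run_yaml and target_dir_2,
        # not the providers_dir/target_dir_1 names from the first block.
        print(f"Failed to copy {run_yaml} to {target_dir_2}. Error: {error}")
        raise
```

On success the helper returns the destination path; on failure it prints a message naming the variables this block actually uses and re-raises, matching the behavior the review suggests.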
Signed-off-by: Christian Meier <m.kristian@web.de>
Closing as superseded by #149.
Update to llama-stack 0.2.23 and ramalama 0.17.1.
Along the way, adjusted the code to use the new APIs from llama-stack. Fixed ramalama-run.yaml to start up the container which is built here: https://github.com/containers/ramalama/blob/main/container-images/llama-stack/Containerfile
It turns out these 'downloads' hacks are not needed anymore once the CMD gets adjusted.
Summary by Sourcery
Update ramalama-stack to use newer llama-stack and ramalama releases and align configuration and provider wiring with the updated APIs.